HDBSCAN clustering for identification of states with lignin dimers bound to \beta-cyclodextrin from molecular dynamics simulations
Author
Brian Novak
Introduction
In our previous study1, we investigated the interaction between \beta-cyclodextrin and lignin dimer derivatives through experimental methods, molecular dynamics simulations using GROMACS 2018.32,3, and docking with AutoDock Vina4–6. The relevant chemical structures are depicted in Figure 1. An introduction to the project7 and a summary of the publication8 are available. During the molecular dynamics simulations, we observed multiple types of dimer-cyclodextrin bound states. We estimated the proportions of these states in unbiased simulations using the following procedure:
Computed a large number of collective variables including angle between lignin dimer and \beta-cyclodextrin principal axes, distances between atoms in the \beta-cyclodextrin molecule to atoms in the lignin dimer molecule, and lignin dihedral angles
Applied principal component analysis (PCA)9 to reduce the number of dimensions to two
Clustered the trajectory configurations using Density-Based Spatial Clustering of Applications with Noise (DBSCAN)10,11 to separate the different states into distinct clusters
Counted the number of points in each cluster and computed proportions of each one
The procedure details can be found in the Supporting Information for the previous study1 on pages 9-18. The primary challenge was encountered in step 3, where the states were not distinctly separated. Trial and error was necessary to select DBSCAN parameters that successfully differentiated the clusters.
Figure 1: Structures of lignin dimer derivatives (1-3), and \beta-cyclodextrin. The lower right structures are 3D top and side views of \beta-cyclodextrin. The 3D representations show \beta-cyclodextrin from top and side views, with hydrogen atoms in white, carbon atoms in cyan, and oxygen atoms in red. The face of \beta-cyclodextrin with two hydroxyl (-OH) groups per unit (seen in the top view) is referred to as the secondary face, while the other face is referred to as the primary face. RDKit12 2023.03.3 was used for the 2D representations and VMD13 1.9.314 was used to create the 3D representations.
Code
"""The files related to this figure are in ./000_Attachments/fig-dimers_BCD_labelled.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from pathlib import Pathimport subprocessimport numpy as npfrom rdkit import Chemfrom rdkit.Chem import Drawimport warnings# Suppress UserWarningwarnings.filterwarnings("ignore", category=UserWarning)# Generate dimer structuressmi_dir ="./000_Attachments/fig-dimers_BCD_labelled"smi_list = [Path(smi_dir, smi) for smi in ['GG.smi', 'TGG.smi', 'GGBB.smi']]rdkit_list = [Chem.MolFromSmiles(np.loadtxt(smi, dtype=str)[0]) for smi in smi_list]dimers_img = Draw.MolsToGridImage(rdkit_list, molsPerRow=3, subImgSize=(400, 200))outfile = Path(smi_dir, "dimers.png")withopen(outfile, "wb") as png: png.write(dimers_img.data)subprocess.run(f"mogrify -trim {outfile}", shell=True) # ImageMagick# Generate cyclodextrin 2D structuresmi = Path(smi_dir, "cyclodextrin.smi")rdkit = Chem.MolFromSmiles(np.loadtxt(smi, dtype=str)[0])outfile = Path(smi_dir, "BCD.png")img = Draw.MolToFile(rdkit, outfile, size=(500, 500))subprocess.run(f"mogrify -trim {outfile}", shell=True)# A cyclodextrin PDB file, BCD.pdb, is included.# VMD structures were created manually (top.tga, side.tga).# Structures were combined into one image (dimers_BCD.png) manually# Label the image with all structuresscript_file = Path(smi_dir, "structures_labelling.sh")subprocess.run(f"bash {script_file}{smi_dir}", shell=True);
In this study, three carefully selected collective variables were utilized to distinguish the observed states, eliminating the need for dimensionality reduction. Additionally, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)15–17 was used instead of DBSCAN for clustering. Finding the point where the number of clusters as a function of the min_samples = min_cluster_size parameter for HDBSCAN started to never increase provided a way to identify a “natural” number of clusters (see Determination of min_samples for details on the algorithm). Additionally, this eliminated the trial and error that was previously required to separate the clusters using PCA and DBSCAN.1 The analysis uncovered configuration types with the lignin dimer bound to the cyclodextrin center that were not evident through mere visual inspection of the trajectories. However, these were subsets of the visually observed types, and only one cluster was dominant for each group of clusters corresponding to the visually observed types.
Methods
Collective variables
Three collective variables were defined based on important configurations seen in unbiased simulations. The collective variables were computed using PLUMED18 2.6.619.
Normal distance
The first collective variable was a signed distance from the cyclodextrin to the lignin dimer along the direction of a vector pointing from the cyclodextrin center through the secondary face.
\overrightarrow{CL} is the vector pointing from the cyclodextrin center of mass (COM) to the lignin COM and \overrightarrow{CS} is the vector pointing from the cyclodextrin COM to the center of the hydroxyl oxygen atoms in the cyclodextrin secondary face.
Code
"""The files related to this figure are in ./000_Attachments/fig-CS_vector.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import MDAnalysis as mdaimport py3Dmol# PDB file namebcd_pdb ="000_Attachments/fig-CS_vector/BCD.pdb"# Load the PDB fileu = mda.Universe(bcd_pdb)# Select the atoms in the cyclodextrincyclodextrin = u.select_atoms("resname BCDC")# Calculate the COM of the cyclodextrincom_cyclodextrin = cyclodextrin.center_of_mass()# Select the oxygen atoms in the secondary face (O2 and O3)oxygen_atoms = cyclodextrin.select_atoms("name O2 O3")# Calculate the COM of the oxygen atomscom_oxygen = oxygen_atoms.center_of_mass()# Calculate the CS vectorcs_vector = com_oxygen - com_cyclodextrin# Create a py3Dmol viewview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representationview.addModel(open(bcd_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3}, "stick": {}})# Add the CS vectorview.addArrow( {"start": {"x": com_cyclodextrin[0], "y": com_cyclodextrin[1], "z": com_cyclodextrin[2]},"end": {"x": com_oxygen[0], "y": com_oxygen[1], "z": com_oxygen[2]},"radius": 0.2,"color": "green", })# Label the CS vector with an offsetlabel_offset = [0.3, 0, 0]view.addLabel("CS", {"position": {"x": com_cyclodextrin[0] + cs_vector[0] + label_offset[0],"y": com_cyclodextrin[1] + cs_vector[1] + label_offset[1],"z": com_cyclodextrin[2] + cs_vector[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the moleculeview.setBackgroundColor("white")view.zoomTo()# Rotate the view to show the CS vectorview.rotate(-70, "x")# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 2: \overrightarrow{CS} represented as a green arrow. In the cyclodextrin structure, gray atoms are carbon, red atoms are oxygen, and hydrogen atoms are not shown.
Code
"""The files related to this figure are in ./000_Attachments/fig-CL_vector.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import numpy as npimport MDAnalysis as mdaimport py3Dmol# PDB file namesbcd_pdb ="000_Attachments/fig-CL_vector/BCD.pdb"lignin_pdb ="000_Attachments/fig-CL_vector/GG.pdb"# Load the PDB file for ligninu_lignin = mda.Universe(lignin_pdb)# Select the lignin moleculelignin = u_lignin.select_atoms("all")# Calculate the COM of the lignin moleculecom_lignin = lignin.center_of_mass()# Load the PDB file for cyclodextrinu_cyclodextrin = mda.Universe(bcd_pdb)# Select the cyclodextrin moleculecyclodextrin = u_cyclodextrin.select_atoms("all")# Calculate the COM of the cyclodextrin moleculecom_cyclodextrin = cyclodextrin.center_of_mass()# Calculate the CL vectorcl_vector = com_lignin - com_cyclodextrin# Recalculate the CL vectorcom_lignin = lignin.center_of_mass()cl_vector = com_lignin - com_cyclodextrin# Create a py3Dmol viewview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representationview.addModel(open(bcd_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3}, "stick": {}})# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add the CL vectorview.addArrow( {"start": {"x": com_cyclodextrin[0], "y": com_cyclodextrin[1], "z": com_cyclodextrin[2]},"end": {"x": com_lignin[0], "y": com_lignin[1], "z": com_lignin[2]},"radius": 0.2,"color": "green", })# Label the CL vectorlabel_offset = [-1.2, 0.4, 0]view.addLabel("CL", {"position": {"x": com_cyclodextrin[0] + label_offset[0],"y": com_cyclodextrin[1] + label_offset[1],"z": com_cyclodextrin[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the cyclodextrin moleculelabel_offset = [0, 8.8, 0]view.addLabel("cyclodextrin", {"position": {"x": com_cyclodextrin[0] + label_offset[0],"y": com_cyclodextrin[1] + label_offset[1],"z": com_cyclodextrin[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the lignin moleculelabel_offset = [0.5, 5, 0]view.addLabel("lignin dimer 1", {"position": {"x": com_lignin[0] + label_offset[0],"y": com_lignin[1] + label_offset[1],"z": com_lignin[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Zoom in a bitview.zoom(1.4)# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 3: \overrightarrow{CL} represented as a green arrow. In the molecular structures, gray atoms are carbon, red atoms are oxygen, and hydrogen atoms are not shown.
Code
"""The files related to this figure are in ./000_Attachments/fig-dnorm.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import py3Dmolimport numpy as npimport MDAnalysis as mda# Load PDB files with mdanalysiscyclodextrin_pdb ="000_Attachments/fig-dnorm/BCD.pdb"lignin_pdb ="000_Attachments/fig-dnorm/GG.pdb"cyclodextrin = mda.Universe(cyclodextrin_pdb)lignin = mda.Universe(lignin_pdb)# Calculate centers of masscyclodextrin_com = cyclodextrin.atoms.center_of_mass()oxygen_com = cyclodextrin.select_atoms("name O2 O3").center_of_mass()lignin_com = lignin.atoms.center_of_mass()# Calculate vectorsCS_vector = oxygen_com - cyclodextrin_comCL_vector = lignin_com - cyclodextrin_com# Calculate dnormdnorm = np.dot(CS_vector, CL_vector) / np.linalg.norm(CS_vector)# Create py3Dmol visualizationview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representation. Make it translucent.view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add green cylinder to represent dnormstart = cyclodextrin_comend = cyclodextrin_com + dnorm * (CS_vector / np.linalg.norm(CS_vector))view.addCylinder( {"start": {"x": start[0], "y": start[1], "z": start[2]},"end": {"x": end[0], "y": end[1], "z": end[2]},"radius": 0.2,"color": "green", })# Label dnorm at the center of the cylinder with an offset.# Use a white background, black text, and a font size of 20.cylinder_center = (start + end) /2label_offset = [0.25, 0, 0]view.addLabel("dₙ", {"position": {"x": cylinder_center[0] + label_offset[0],"y": cylinder_center[1] + label_offset[1],"z": cylinder_center[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the cyclodextrin moleculelabel_offset = [0, 0, 3.8]view.addLabel("cyclodextrin", {"position": {"x": cyclodextrin_com[0] + label_offset[0],"y": cyclodextrin_com[1] + label_offset[1],"z": cyclodextrin_com[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label the lignin moleculelabel_offset = [-1.65, 0, 3.9]view.addLabel("lignin dimer 1", {"position": {"x": lignin_com[0] + label_offset[0],"y": lignin_com[1] + label_offset[1],"z": lignin_com[2] + label_offset[2], },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate the view to see the dnorm vectorview.rotate(90, "x")# Zoom in a bitview.zoom(1.1)view.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 4: The signed distance from the cyclodextrin to the lignin dimer along \overrightarrow{CS}, d_n, shown as a green cylinder. In the molecular structures, gray atoms are carbon, red atoms are oxygen, cyclodextrin is translucent, and hydrogen atoms are not shown.
Tangential distance
The second collective variable was the distance from the cyclodextrin COM to the lignin dimer COM in the direction perpendicular to \overrightarrow{CS}. It is the length of the second leg of a right triangle with hypotenuse |\overrightarrow{CL}| and one leg d_n.
"""The files related to this figure are in ./000_Attachments/fig-dtang.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import MDAnalysis as mdaimport numpy as npimport py3Dmol# PDB file pathscyclodextrin_pdb ="000_Attachments/fig-dtang/BCD.pdb"lignin_pdb ="000_Attachments/fig-dtang/GG.pdb"# Load the PDB filesu_cyclodextrin = mda.Universe(cyclodextrin_pdb)u_lignin = mda.Universe(lignin_pdb)# Calculate the COM for cyclodextrin and lignin dimercom_cyclodextrin = u_cyclodextrin.atoms.center_of_mass()com_lignin = u_lignin.atoms.center_of_mass()# Calculate the COM for oxygen atoms in cyclodextrin secondary faceoxygen_atoms = u_cyclodextrin.select_atoms("name O2 O3")com_oxygen = oxygen_atoms.center_of_mass()# Calculate the vectorsCS_vector = com_oxygen - com_cyclodextrinCL_vector = com_lignin - com_cyclodextrin# Calculate the projection of CL_vector onto the plane perpendicular to CS_vectorCS_unit_vector = CS_vector / np.linalg.norm(CS_vector)projection = np.dot(CL_vector, CS_unit_vector) * CS_unit_vectordtang_vector = CL_vector - projectiondtang = np.linalg.norm(dtang_vector)# Create the py3Dmol visualizationview = py3Dmol.view()view.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")view.addModel(open(lignin_pdb, "r").read(), "pdb")# Set CPK representation for both molecules. Make the cyclodextrin translucent.view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add a green cylinder to represent dtangstart = com_cyclodextrinend = com_cyclodextrin + dtang_vectorview.addCylinder( {"start": {"x": float(start[0]), "y": float(start[1]), "z": float(start[2])},"end": {"x": float(end[0]), "y": float(end[1]), "z": float(end[2])},"radius": 0.2,"color": "green", })# Label dtanglabel_offset = [-5, 0, 0.3]view.addLabel("dₜ", {"position": {"x": float(end[0] + label_offset[0]),"y": float(end[1] + label_offset[1]),"z": float(end[2] + label_offset[2]), },"backgroundColor": "white","fontColor": "black","fontSize": 20, },)# Label cyclodextrinlabel_offset = [-4, 0, 3.8]view.addLabel("cyclodextrin", {"position": {"x": float(com_cyclodextrin[0] + label_offset[0]),"y": float(com_cyclodextrin[1] + label_offset[1]),"z": float(com_cyclodextrin[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label ligninlabel_offset = [-1.7, 0, 3.8]view.addLabel("lignin dimer 1", {"position": {"x": float(com_lignin[0] + label_offset[0]),"y": float(com_lignin[1] + label_offset[1]),"z": float(com_lignin[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate the view to show dtangview.rotate(90, "x")view.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 5: The distance from the cyclodextrin COM to the lignin dimer COM in the direction perpendicular to \overrightarrow{CS}, d_t, represented as a green cylinder. In the molecular structures, gray atoms are carbon, red atoms are oxygen, cyclodextrin is translucent, and hydrogen atoms are not shown.
Relative orientation
The third collective variable was the cosine of the angle between the vector from the center of the ring atoms in the lignin head to the center of the ring atoms in the lignin tail (\overrightarrow{HT}) and \overrightarrow{CS}.
"""The files related to this figure are in ./000_Attachments/fig-HT_vector.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import py3Dmolimport numpy as npimport MDAnalysis as mda# Load PDB file with MDAnalysislignin_pdb ="000_Attachments/fig-HT_vector/GG.pdb"lignin = mda.Universe(lignin_pdb)# Lignin head and tail positions for HT vectorlignin_head = lignin.select_atoms("resid 1 and name C1 C2 C3 C4 C5 C6").center_of_mass()lignin_tail = lignin.select_atoms("resid 2 and name C1 C2 C3 C4 C5 C6").center_of_mass()# Create py3Dmol visualizationview = py3Dmol.view(width=800, height=600)# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})# Add green arrow to represent HT vectorview.addArrow( {"start": {"x": float(lignin_head[0]),"y": float(lignin_head[1]),"z": float(lignin_head[2]), },"end": {"x": float(lignin_tail[0]),"y": float(lignin_tail[1]),"z": float(lignin_tail[2]), },"radius": 0.2,"color": "green", })# Label HTlabel_offset = [1, 0, 0]view.addLabel("HT", {"position": {"x": float(lignin_tail[0] + label_offset[0]),"y": float(lignin_tail[1] + label_offset[1]),"z": float(lignin_tail[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label ligninlignin_com = lignin.atoms.center_of_mass()label_offset = [-3, 0, 6]view.addLabel("lignin dimer 1", {"position": {"x": float(lignin_com[0] + label_offset[0]),"y": float(lignin_com[1] + label_offset[1]),"z": float(lignin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate for better view of the HT vectorview.rotate(-90, "x")view.rotate(-30, "z")# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 6: The lignin head to tail vector, \overrightarrow{HT}, represented as a green arrow. In the translucent lignin structure, gray atoms are carbon, red atoms are oxygen, and hydrogen atoms are not shown.
Code
"""The files related to this figure are in ./000_Attachments/fig-theta.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import py3Dmolimport numpy as npimport MDAnalysis as mda# Load PDB files with MDAnalysiscyclodextrin_pdb ="000_Attachments/fig-theta/BCD.pdb"lignin_pdb ="000_Attachments/fig-theta/GG.pdb"cyclodextrin = mda.Universe(cyclodextrin_pdb)lignin = mda.Universe(lignin_pdb)# Calculate centers of masscyclodextrin_com = cyclodextrin.atoms.center_of_mass()oxygen_com = cyclodextrin.select_atoms("name O2 O3").center_of_mass()# Calculate CS vectorCS_vector = oxygen_com - cyclodextrin_com# Calculate HT vectorlignin_head = lignin.select_atoms("resid 1 and name C1 C2 C3 C4 C5 C6").center_of_mass()lignin_tail = lignin.select_atoms("resid 2 and name C1 C2 C3 C4 C5 C6").center_of_mass()HT_vector = lignin_head - lignin_tail# Shift HT vector to originate at cyclodextrin COMHT_vector_shifted = HT_vector + cyclodextrin_com# Calculate the cosine of the angle θcos_theta = np.dot(HT_vector, CS_vector) / (np.linalg.norm(HT_vector) * np.linalg.norm(CS_vector))theta = np.arccos(cos_theta)# Create py3Dmol visualizationview = py3Dmol.view(width=800, height=600)# Add the cyclodextrin molecule in CPK representationview.addModel(open(cyclodextrin_pdb, "r").read(), "pdb")view.setStyle({"model": 0}, {"sphere": {"scale": 0.3, "opacity": 0.5}, "stick": {"opacity": 0.5}})# Add the lignin molecule in CPK representationview.addModel(open(lignin_pdb, "r").read(), "pdb")view.setStyle({"model": 1}, {"sphere": {"scale": 0.3}, "stick": {}})# Add green arrows to represent HT and CS vectorsview.addArrow( {"start": {"x": float(cyclodextrin_com[0]),"y": float(cyclodextrin_com[1]),"z": float(cyclodextrin_com[2]), },"end": {"x": float(HT_vector_shifted[0]),"y": float(HT_vector_shifted[1]),"z": float(HT_vector_shifted[2]), },"radius": 0.2,"color": "green", })view.addArrow( {"start": {"x": float(cyclodextrin_com[0]),"y": float(cyclodextrin_com[1]),"z": float(cyclodextrin_com[2]), },"end": {"x": float(oxygen_com[0]), "y": float(oxygen_com[1]), "z": float(oxygen_com[2])},"radius": 0.2,"color": "green", })# Label HT, CS, and θlabel_offset = [0, 0, 1]view.addLabel("HT", {"position": {"x": float(HT_vector_shifted[0] + label_offset[0]),"y": float(HT_vector_shifted[1] + label_offset[1]),"z": float(HT_vector_shifted[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)label_offset = [-3, 0, 1]view.addLabel("CS", {"position": {"x": float(oxygen_com[0] + label_offset[0]),"y": float(oxygen_com[1] + label_offset[1]),"z": float(oxygen_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)label_offset = [0, 0, 1.6]view.addLabel("θ", {"position": {"x": float(cyclodextrin_com[0] + label_offset[0]),"y": float(cyclodextrin_com[1] + label_offset[1]),"z": float(cyclodextrin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label cyclodextrinlabel_offset = [-4, 0, 5.8]view.addLabel("cyclodextrin", {"position": {"x": float(cyclodextrin_com[0] + label_offset[0]),"y": float(cyclodextrin_com[1] + label_offset[1]),"z": float(cyclodextrin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Label ligninlignin_com = lignin.atoms.center_of_mass()label_offset = [-9, 0, -1]view.addLabel("lignin dimer 1", {"position": {"x": float(lignin_com[0] + label_offset[0]),"y": float(lignin_com[1] + label_offset[1]),"z": float(lignin_com[2] + label_offset[2]), },"backgroundOpacity": 0,"fontColor": "black","fontSize": 20, },)# Set the background color and zoom to fit the modelsview.setBackgroundColor("white")view.zoomTo()# Rotate for better view of θview.rotate(-90, "x")view.rotate(-30, "z")# Show the viewview.show()
3Dmol.js failed to load for some reason. Please check your browser console for error messages.
Figure 7: The angle \theta between \overrightarrow{HT} and \overrightarrow{CS} represented as a green arrow. The origin of \overrightarrow{HT} is shifted to the cyclodextrin COM. In the molecular structures, gray atoms are carbon, red atoms are oxygen, cyclodextrin is translucent, and hydrogen atoms are not shown.
HDBSCAN clustering
Determination of min_samples
The best value of min_samples = min_cluster_size used in the HDBSCAN algorithm was determined using the following algorithm:
Find the last value of min_samples, S, where the number of clusters corresponding to S+1 is larger than the number of clusters correponding to S. The number of clusters corresponding to S is C.
Choose the smallest value of min_samples > S where the number of clusters is equal to C.
Results
Number of clusters
The numbers of clusters as a function of min_samples are shown in Figure 8, Figure 9, and Figure 10. The best values for min_samples and the corresponding numbers of clusters were determined using the algorithm described in Determination of min_samples.
Figure 8: Number of clusters for dimer 1 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""The files related to this figure are in ./000_Attachments/fig-nclusters_GG.A singularity image files is ...This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_GG/min_samples.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 9: Number of clusters for dimer 2 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""The files related to this figure are in ./000_Attachments/fig-nclusters_TGG.A singularity image files is ...This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_TGG/min_samples.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 10: Number of clusters for dimer 3 as a function of min_samples for the HDBSCAN algorithm. The best value for min_samples and the corresponding number of clusters is circled in red and indicated in the legend.
Code
"""The files related to this figure are in ./000_Attachments/fig-nclusters_GG_BB.A singularity image files is ...This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-nclusters_GG_BB/min_samples.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Cluster scatter plots
The 3D scatter plots of the collective variables showing the HDBSCAN clusters for each dimer are shown in Figure 11, Figure 12, and Figure 13.
Code
"""The files related to this figure are in ./000_Attachments/fig-clusters_GG.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters-GG/clusters_configs.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)# Interactive 3D scatter plotimport plotly.io as pioimport json# Load the figure from the JSON filewithopen("./000_Attachments/fig-clusters_GG/clusters.json", 'r') as f: fig_data = json.load(f)# Create a figure from the loaded datafig = pio.from_json(json.dumps(fig_data))# Display the figurefig.show()
Figure 11: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 1. Representative configurations for each cluster are shown around the static scatter plot except for the cyan and brown clusters which include configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The cyclodextrin atoms and bonds are gray except for the primary face oxygen atoms are green and the secondary face oxygen atoms are purple. Atom colors in the lignin dimer molecules: H=white, C=cyan, O=red. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Code
"""The files related to this figure are in ./000_Attachments/fig-clusters_TGG.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters_TGG/clusters_configs.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)# Interactive 3D scatter plotimport plotly.io as pioimport json# Load the figure from the JSON filewithopen("./000_Attachments/fig-clusters_TGG/clusters.json", 'r') as f: fig_data = json.load(f)# Create a figure from the loaded datafig = pio.from_json(json.dumps(fig_data))# Display the figurefig.show()
Figure 12: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 2. Representative configurations for each cluster are shown around the static scatter plot except for the orange cluster which includes configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The cyclodextrin atoms and bonds are gray except for the primary face oxygen atoms are green and the secondary face oxygen atoms are purple. Atom colors in the lignin dimer molecules: H=white, C=cyan, O=red. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Code
"""The files related to this figure are in ./000_Attachments/fig-clusters_GG_BB.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-clusters-GG_BB/clusters_configs.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)# Interactive 3D scatter plotimport plotly.io as pioimport json# Load the figure from the JSON filewithopen("./000_Attachments/fig-clusters_GG_BB/clusters.json", 'r') as f: fig_data = json.load(f)# Create a figure from the loaded datafig = pio.from_json(json.dumps(fig_data))# Display the figurefig.show()
Figure 13: Interactive and static 3D scatter plots of the collective variables showing the HDBSCAN clusters for dimer 3. Representative configurations for each cluster are shown around the static scatter plot except for the purple cluster which includes configurations where the lignin dimer interacts (weakly) with the outside of the cyclodextrin ring or where there is no direct contact between the lignin dimer and cyclodextrin at all. The cyclodextrin atoms and bonds are gray except for the primary face oxygen atoms are green and the secondary face oxygen atoms are purple. Atom colors in the lignin dimer molecules: H=white, C=cyan, O=red. The arrows in the configurations point in the directions of \overrightarrow{CS} and \overrightarrow{HT}. HDBSCAN outlier points are not plotted.
Definition of cluster groups
For clusters containing configurations with a lignin dimer bound to the center of the cyclodextrin, clusters with similar binding configurations were grouped together. Those groups correspond to configurations that were visually observed in the trajectories. The cluster groups are described below in terms of the normal distances from cylodextrin COM to the lignin dimer COM, head, and tail. This is probably more intuitive than using a combination of d_norm and \cos \theta. The cyclodextrin center COM to lignin dimer COM normal distance is d_{norm} and the cyclodextrin COM to lignin dimer head and tail normal distances are defined analagous to d_{norm} in Equation 1.
Dimer 1 cluster groups
For dimer 1, the cluster groups were named head-sec, center-sec, and tail-sec. The head-sec group consists of clusters where the lignin dimer head distance is usually less than the lignin dimer tail and COM distances and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). The center-sec group consists of clusters where the lignin dimer COM, head, and tail are about the same distance from the cyclodextrin COM and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). The tail-sec group consists of clusters where the lignin dimer tail distance is usually less than the lignin dimer head and COM distances and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). Kernel density estimate (KDE) plots of the cyclodextrin COM to lignin dimer normal distances (COM, head, tail) for each cluster group are shown in Figure 14, Figure 15, and Figure 16.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_GG-head-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG-head-sec/cluster_distances_KDE_head-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 14: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 1 for the head-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_GG-center-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG-center-sec/cluster_distances_KDE_center-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 15: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 1 for the center-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_GG-tail-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG-tail-sec/cluster_distances_KDE_tail-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 16: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 1 for the tail-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Dimer 2 cluster groups
For dimer 2, the cluster groups were named head-sec, center-sec, tail-sec, and tail-pri. The head-sec, center-sec, tail-sec groups were defined analogously to dimer 1. The tail-pri group consists of clusters where the lignin dimer tail distance is usually greater than the lignin dimer head and COM distances and the COM of the lignin dimer is closer to the cyclodextrin primary face than the cyclodextrin secondary face (negative distance). KDE plots of the cyclodextrin COM to lignin dimer normal distances (COM, head, tail) for each cluster group are shown in Figure 17, Figure 18, Figure 19, and Figure 20.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_TGG-head-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-head-sec/cluster_distances_KDE_head-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 17: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the head-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_TGG-center-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-center-sec/cluster_distances_KDE_center-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 18: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the center-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_TGG-tail-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-tail-sec/cluster_distances_KDE_tail-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 19: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the tail-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_TGG-tail-pri.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_TGG-tail-pri/cluster_distances_KDE_tail-pri.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 20: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 2 for the tail-pri cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Dimer 3 cluster groups
For dimer 3, the cluster groups were named end-sec and center. The end-sec group consists of clusters where the distance for one of the lignin dimer ends (molecule is symmetric) is usually less than the distance for the lignin dimer end and the lignin dimer COMs and the COM of the lignin dimer is closer to the cyclodextrin secondary face than the cyclodextrin primary face (positive distance). The center group consists of clusters where the lignin dimer COM is located near the cyclodextrin. KDE plots of the cyclodextrin COM to lignin dimer normal distances (COM, ends) for each cluster group are shown in Figure 21 and Figure 22.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_GG_BB-end-sec.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG_BB-end-sec/cluster_distances_KDE_end-sec.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 21: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 3 for the center-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Code
"""The files related to this figure are in ./000_Attachments/fig-kde_GG_BB-center.This figure was generated using snakemake in the input directory.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""import subprocessimport osfrom pathlib import Path# Current directorycurrent_dir = Path().resolve()# Change directoryinput_dir = Path("../../input").resolve()os.chdir(input_dir)# Run the snakemake commandsubprocess.run( ["snakemake","../writing/HDBSCAN_clustering_for_identification_of_states_with_lignin_dimers_bound_to_cyclodextrin_from_molecular_dynamics_simulations/000_Attachments/fig-kde_GG_BB-center/cluster_distances_KDE_center.png","-j", ], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,)# Change directory backos.chdir(current_dir)
Figure 22: KDE plots of the cyclodextrin to lignin dimer normal distances for dimer 3 for the end-sec cluster group. In the legend, COM refers to the cyclodextrin COM to lignin dimer COM distance, head refers to the cyclodextrin COM to lignin dimer head distance, and tail refers to the cyclodextrin COM to lignin dimer tail distance.
Proportion of configurations in each cluster
For clusters and cluster groups containing configurations with a lignin dimer bound to the center of the cyclodextrin, the proportions of those configurations belonging to each cluster or cluster group were calculated. The proportions are shown in Table 1, Table 2, and Table 3.
Code
"""The data for this table is in ./000_Attachments/tbl-fractions_GG/fractions.csv.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from IPython.display import display, HTMLimport pandas as pd# Read CSV file into a DataFramedf = pd.read_csv("./000_Attachments/tbl-fractions_GG/fractions.csv")# Convert DataFrame to HTML without the indexhtml_table = df.to_html(index=False)# Display the HTML table in the notebookdisplay(HTML(html_table))
Table 1: Proportions of center-bound configurations belonging to each cluster and cluster group for dimer 1. Refer to Figure 11 to see the clusters and configurations corresponding to the label numbers. The head-sec, center-sec, and tail-sec labels refer to cluster groups (see Dimer 1 cluster groups). The clusters in each cluster group are listed in the rows above them up until another cluster group label is encountered or the top of the table.
Label
Fraction
4
0.0020
5
0.0060
6
0.5051
7
0.0054
8
0.0067
head:sec
0.5253
2
0.0571
center:sec
0.0571
3
0.4176
tail:sec
0.4176
There are 3 cluster groups defined for dimer 1: head-sec, center-sec, and tail-sec (see Dimer 1 cluster groups). The head-sec group consists of 5 clusters. However, most of the configurations belong to cluster 6. Clusters 4, 5, 7 and 8 account for a total of about 2.0% of all configurations, while cluster 6 has about 50.5% of all configurations. The center-sec and tail-sec groups each consist of a single cluster with about 5.7% and 41.8% of all configurations, respectively.
Code
"""The data for this table is in ./000_Attachments/tbl-fractions_TGG/fractions.csv.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from IPython.display import display, HTMLimport pandas as pd# Read CSV file into a DataFramedf = pd.read_csv("./000_Attachments/tbl-fractions_TGG/fractions.csv")# Convert DataFrame to HTML without the indexhtml_table = df.to_html(index=False)# Display the HTML table in the notebookdisplay(HTML(html_table))
Table 2: Proportions of center-bound configurations belonging to each cluster and cluster group for dimer 2. Refer to Figure 12 to see the clusters and configurations corresponding to the label numbers. The head-sec, center-sec, tail-sec, and tail-pri labels refer to cluster groups (see Dimer 2 cluster groups). The clusters in each cluster group are listed in the rows above them up until another cluster group label is encountered or the top of the table.
Label
Fraction
4
0.0229
5
0.7153
head:sec
0.7382
3
0.0280
center:sec
0.0280
2
0.2300
tail:sec
0.2300
1
0.0037
tail:pri
0.0037
There are 4 cluster groups defined for dimer 2: head-sec, center-sec, tail-sec, and tail-pri (see Dimer 2 cluster groups). The head-sec group consists of 2 clusters. However, most of the configurations belong to cluster 5. Cluster 4 accounts for about 2.3% of all configurations, while cluster 5 has about 71.5% of all configurations. The center-sec, tail-sec, and tail-pri groups consist of a single clusters with about 2.8%, 23.0%, and 0.4% of all configurations, respectively.
Code
"""The data for this table is in ./000_Attachments/tbl-fractions_GG_BB/fractions.csv.A conda environment file, bcd_ligin.yml, is in ./000_Attachments."""from IPython.display import display, HTMLimport pandas as pd# Read CSV file into a DataFramedf = pd.read_csv("./000_Attachments/tbl-fractions_GG_BB/fractions.csv")# Convert DataFrame to HTML without the indexhtml_table = df.to_html(index=False)# Display the HTML table in the notebookdisplay(HTML(html_table))
Table 3: Proportions of center-bound configurations belonging to each cluster and cluster group for dimer 3. Refer to Figure 13 to see the clusters and configurations corresponding to the label numbers. The end-sec and center labels refer to cluster groups (see Dimer 3 cluster groups). The clusters in each cluster group are listed in the rows above them up until another cluster group label is encountered or the top of the table.
Label
Fraction
1
0.0717
3
0.0027
end:sec
0.0744
2
0.9256
center
0.9256
There are 2 cluster groups defined for dimer 3: end-sec and center (see Dimer 3 cluster groups). The end-sec group consists of 2 clusters. However, most of the configurations belong to cluster 1. Cluster 3 accounts for about 0.3% of all configurations, while cluster 5 has about 7.2% of all configurations. The center group consists of a single cluster with about 92.6% of all configurations.
Comparison to previous work
dd
Conclusions
sdf
Future work
fwe
References
(1)
Dean, K. R.; Novak, B.; Moradipour, M.; Tong, X.; Moldovan, D.; Knutson, B. L.; Rankin, S. E.; Lynn, B. C. Complexation of Lignin Dimers with β-Cyclodextrin and Binding Stability Analysis by ESI-MS, Isothermal Titration Calorimetry, and Molecular Dynamics Simulations. J. Phys. Chem. B2022, 126 (8), 1655–1667. https://doi.org/10.1021/acs.jpcb.1c09190.
(2)
Kutzner, C.; Páll, S.; Fechner, M.; Esztermann, A.; de Groot, B. L.; Grubmüller, H. More Bang for Your Buck: Improved Use of GPU Nodes for GROMACS 2018. J. Comput. Chem.2019, 40 (27), 2418–2431. https://doi.org/10.1002/jcc.26011.
Eberhardt, J.; Santos-Martins, D.; Tillack, A. F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model.2021, 61 (8), 3891–3898. https://doi.org/10.1021/acs.jcim.1c00203.
(5)
Trott, O.; Olson, A. J. AutoDock Vina: Improving the Speed and Accuracy of Docking with a New Scoring Function, Efficient Optimization, and Multithreading. J. Comput. Chem.2010, 31 (2), 455–461. https://doi.org/10.1002/jcc.21334.
Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining; KDD’96; AAAI Press: Portland, Oregon, 1996; pp 226–231.
Campello, R. J. G. B.; Moulavi, D.; Zimek, A.; Sander, J. Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection. ACM Trans. Knowl. Discov. Data2015, 10 (1), 5:1–5:51. https://doi.org/10.1145/2733381.
Tribello, G. A.; Bonomi, M.; Branduardi, D.; Camilloni, C.; Bussi, G. PLUMED 2: New Feathers for an Old Bird. Comput. Phys. Commun.2014, 185 (2), 604–613. https://doi.org/10.1016/j.cpc.2013.09.018.